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Abstract 

We consider the problem of modeling, estimating, and controlling the latent state of a spa¬ 
tiotemporally evolving continuous function using very few sensor measurements and actuator 
locations. Our solution to the problem consists of two parts: a predictive model of functional 
evolution, and feedback based estimator and controllers that can robustly recover the state of 
the model and drive it to a desired function. We show that layering a dynamical systems prior 
over temporal evolution of weights of a kernel model is a valid approach to spatiotemporal mod¬ 
eling that leads to systems theoretic, control-usable, predictive models. We provide sufficient 
conditions on the number of sensors and actuators required to guarantee observability and con¬ 
trollability. The approach is validated on a large real dataset, and in simulation for the control 
of spatiotemporally evolving function. 


1 Introduction 

Modeling, control, and estimation of spatiotemporally varying systems is a challenging area in 
controls research. These systems are characterized by dynamic evolution in both the spatial and 
temporal variables. Some examples of relevant problems include active wing-shaping based control 
of flexible aircraft, control of heat or particulate diffusion in manufacturing processes, control of 
rumor spreading across a social network, and tactical asset allocation and control problems in dy¬ 
namically varying battlespaces. The traditional approach to modeling and control of spatiotemporal 
systems have relied on Partial Differential Equations (PDEs) [T], solutions to which are functions 
that evolve in both space and time. However, PDE models can be limited in situations where 
exact physics based models of the functional evolution are difficult to formulate, or are inherently 
limited due to the physical understanding of the process or unknown spatiotemporal interactions 
[3]. Furthermore, the control of PDEs is fundamentally more challenging than the control of finite¬ 
dimensional state-space systems because the evolution and control spaces are infinite dimensional 
Hilbert spaces, as opposed to [I]. 

Accordingly, there has been significant work in approximate modeling of spatiotemporally evolv¬ 
ing functions using data-driven or distributed parameter based approximations of PDEs HUE]. One 
way to model spatiotemporally evolving functions is to approximate the function at several sampling 
locations and build an autoregressive model of the evolution of the function’s output over that grid 
[2] . The fidelity of these models heavily depends on the number of sampling (equivalently Euclidean 
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grid locations in the independent variable space) locations employed, with a large number of grid 
locations leading to large-scale state-space models that are difficult to manage. An alternative ap¬ 
proach to modeling spatiotemporal functional evolution relies on modeling the correlation between 
any two sampling locations through a smooth covariance kernel [3]. The model of the evolution is 
then formed through a linear, weighted combination of the kernels, and the hyperparameters of the 
spatiotemporal covariance kernel and the weights are learned by solving an optimization problem. 
The power and flexibility of this approach lies in the fact that kernels can be defined over abstract 
objects, and not just Euclidean grid locations, leading to a modeling technique that is domain ag¬ 
nostic. For example, kernel embeddings are available for graphical models studied in decentralized 
control [8], images and many other domains. However, formulating control-usable kernel-based 
models of spatiotemporal phenomena can be challenging due to the need to take into account the 
spatiotemporal dependence. Many recent techniques in spatiotemporal modeling have focused on 
covariance kernel design and associated hyperparameter learning algorithms 13 El mi 13]. The 
main benefit of careful design of covariance kernels over approaches that simply include time in as 
an additional input variable mm is that they can account for intricate spatiotemopral couplings. 
However, there are two key challenges with these approaches: the first challenge is in ensuring 
the scalability of the model to large scale phenomena. This is difficult due to the fact that the 
hyperparameter optimization problem is not convex in general, and because when time is used 
as a kernel input, it is nontrivial to restrict the number of kernels used without losing modeling 
fidelity mum- The second very important challenge is concerned with the formulation of feasible 
control strategies utilizing predictive kernel-based models of spatiotemporal phenomena. In partic¬ 
ular, when the spatiotemporal evolution is embedded in the design of complex covariance kernel, 
the resulting model of functional evolution can be highly nonlinear and difficult to utilize in control 
design. 

In this paper, we pursue an alternative systems-theoretic approach to the modeling, control, 
and estimation of spatiotemporally varying functions that fuses the strengths of kernel methods 
with systems theory. Our main contribution is to provide a systems-theoretic formulation for 
approximating, with very high accuracy, spatiotemporal functional evolution by layering a linear 
dynamical systems prior over temporal evolution of weights of a kernel model. For a class of linearly 
evolving PDFs, such as the heat diffusion and the wave equation, our approach can lead to a very 
high-accuracy approximation. This modeling approach is also applicable to data-driven modeling of 
real-world phenomena, which we demonstrate on a challenging inference problem on satellite data of 
sea surface temperatures. One benefit of our model is that it can encode spatiotemporal evolution 
of complex nonlinear surfaces through an Ordinary Differential Equation (ODE) evolving in a 
Hilbert space induced by the specific kernel choice. Yet, the main benefit of our systems-theoretic 
approach is that it is highly conducive to control synthesis. To illustrate this fact, we demonstrate 
that feasible control strategies for a class of spatiotemporally evolving systems can be found using 
linear control synthesis. In particular, we derive sufficient conditions on the kernel selection to 
guarantee observability and controllability of the presented model. Furthermore, we demonstrate 
control synthesis for a diffusion PDF using simple Gaussian kernels distributed uniformly in the 
input domain. 

The outline of this paper is as follows, Section focuses on the development of a systems- 
theoretic kernel-based model of spatiotemporal evolution. Section 2.2 presents the main theoretical 
results. Section presents modeling results on a real-world large dataset and control synthesis 
results for a diffusion PDF. 
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2 Kernel Controllers 


This section outlines our modeling framework and presents theoretical results associated with the 
number of sampling locations required for monitoring functional evolution. 

2.1 Problem Formulation 

We focus on predictive inference and control over a time-varying stochastic process, whose mean / 
is temporally evolving: 


fk+i (1) 

where F is a distribution varying with time t and exogenous inputs ij. The theory of reproducing ker¬ 
nel Hilbert spaces (RKHSs) provides powerful tools for generating flexible classes of functions with 
relative ease, and is thus a natural choice for modeling complex spatial functions |15] . Therefore, 
our focus will be on spatiotemporally evolving kernel-based models, such as Gaussian Processes 
(GPs). In a kernel-based model, >-]Risa positive definite kernel on some compact 

domain H that models the covariance between any two points in the input space. A Mercer kernel 
m implies the existence of a smooth map : H —>■ Pf, where Ti is an RKHS with the property 

k{x, y) = ^p{y))n = •))> (2) 

There is a large body of literature on modeling spatiotemporal evolution in % muz!. A simple 
approach for spatiotemporal modeling is to utilize both spatial and temporal variables as inputs 
to the kernel mm- However, this technique leads to an ever-growing kernel dictionary, which is 
computationally taxing. Furthermore, constraining the dictionary size or utilizing a moving window 
will occlude the learning of long-term patterns. Periodic or nonstationary covariance functions and 
nonlinear transformations have been proposed to address this issue mm- Furthermore, work in 
the design of nonseparable and nonstationary covariance kernels seeks to design kernels optimized 
to environment-specific dynamics, and optimize their hyperparameters in local regions of the input 
space EiEiin]. The model of spatiotemporal functional evolution proposed in this paper builds 
on the idea that modeling the temporal evolution of mixing weights of a kernel model is a valid 
approach to spatiotemporal modeling. The key idea behind our approach is that the spatiotemporal 
evolution of a kernel-based model can be directly modeled by tracing the evolution of the mean 
embedded in a RKHS using switched ordinary differential equations (ODE) when the evolution is 
continuous, or switched difference equations when it is discrete (Figure [^. The advantage of this 
approach is that it allows us to utilize powerful ideas from systems theory for knowing necessary 
conditions for functional convergence; furthermore, it offers a natural framework for designing 
control mechanisms as well. In this paper, we restrict our attention to the class of functional 
evolutions F defined by linear Markovian transitions in an RKHS. While extension to the nonlinear 
case is possible (and non-trivial), it is not pursued in this paper to help ease the exposition of key 
ideas. Let y G be the measurements of the function available from N sensors, A : Ti ^ T-L he a 
linear transition operator in the RKHS T-L, and fC ^ be a linear measurement operator, the 
model for the infinite-dimensional functional evolution and measurement studied in this paper is: 

fk+i = Afk + Vk (3) 

yk = ICfk + Ck, (4) 

where rjk is a zero-mean stochastic process in T-L, and Ck is a Wiener process in For many 
kernels, the feature map ■0 is unknown, and therefore it is necessary to work in the dual space of T-L. 
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Figure 1: Two types of Hilbert space evolutions. Left: the model, represented by the functions 
rrii, switches discretely in the Hilbert space Right: the evolution of the function mt is smooth, 
represented by a solution to an ordinary differential equation in %. 

For concreteness, we work with an approximate space as follows: given points C = {ci,... ,cm}; 
Cj G H, we have a dictionary of atoms Fc = [V'(ci) • • • , V'(ci) £ the span of which is 

a strict subspace of the RKHS generated by the kernel. Formally, we have 

C I-7-"Re := span [-(/^(ci) ••• '^(cm)] C H. (5) 

This regime, which trades off the flexibility of a truly nonparametric approach for computational 
realizability, still allows for the representation of rich phenomena. Let N represent the number 
of sampling locations, and M be the number of bases generating l-Lc- Note that every function 
/ G %c has an expansion of the form 


M 


f{x) = y Wik{ci,x). 


i=l 


( 6 ) 


This expansion allows us to write the Wi coordinates in the dual space as vectors w G . We can 
show the relation of the function spaces to their Euclidean counterparts via commutative diagrams. 
Dehne W : lie as the operator that maps the coordinates Wi in ^ to vectors w G M^, 

and let W~^ : — )• Tic- Note that for hnite-dimensional spaces, this inverse map always exists. 

These definitions allow us to outline the relations between the dynamics operators A and A, and 


the measurement operators JC and K using the commutative diagrams in Figure 2(a) and Figure 
|2(b)| respectively. The finite-dimensional evolution equations equivalent to Q in the dual space 
can be formulated as 


Wk+i = Awk + Vk (7) 

yk = i^kWk + (k, ( 8 ) 

where we have matrices A G G the vectors Wk,w G M^, and we have slightly 

abused notation to let r]k and (k denote their Tic counterparts. Note that the measurement operator 
/C is simply a sampling of the function / at an arbitrary set of sensing locations X = {xi,..., xat}, 
where Xi G H: we will see how this affects the structure of Kk momentarily. 

The equations ^ suggest an immediate extension to functional control problems. Pick another 
dictionary of atoms Fd = , "ipidj) G Ti, dj G H, the span of which, denoted by 
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(a) Relationship between A and A 


(b) Relationship between 1C and K 


Hd 


B 


-He 


W 


w-i 


^ 

t) 

(c) Relationship between B and B 


Figure 2: Commutative diagrams between primal and dual spaces 


is a strict subspace of the RKHS % generated by the kernel. The functional evolution equation 
is then as follows: 


fk+l — -^fk + B6k + Tjk (9) 

t/fc = ACfc/fc + Cfe, (10) 


where the control functions 6^ evolve in T-Ld, and B : Hd ^ 'He- To derive the finite-dimensional 
equivalent of B, we have to work out the structure of the matrix B: since He is not, in general, 
isomorphic to 'R/j, this imposes strict restrictions on B. We derive B using least squares using the 
inner product of B. Let 5 = x), and let Be = [V’(ci) • • • '^(cm)] be the basis for 

Tic- Then the projection of <5 onto Be can be derived as 


(<5,'0(ci))'H 


k{di,ci) ■ 

•• k{di,ci) 


Wl 



k{di,CM) ■ 

•• k{di,CM) 


_Wi_ 




using the reproducing property. This derivation shows that the operator B = Keo £ R^^^, the 
kernel matrix between the data C generating the atoms Be of Be and the data D generating 
the atoms Bd of Bd- Using similar arguments, it can be shown that, given sensing locations 
X = {xi, X 2 ,..., xat}, Kf) G R^^^ is the kernel matrix between X and D. Thus the finite¬ 
dimensional evolution equations equivalent to ([^ are 

Wk = Awk +KeoWk ( 11 ) 

Vk = KkWk- ( 12 ) 


We pause here to point out just how flexible the kernel-based framework is. First of all, the choice 
of kernel completely determines the space B, which may allow wildly different functional outputs 
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(a) Gaussian 


(b) Laplacian 



(c) Periodic 


(d) Locally periodic 


Figure 3: One-dimensional function evolution over a fixed systems matrix A, initial condition wq 
and centers C, but with different kernels k(x, y). Each y-vector at a given value of x represents the 
output of the function which evolves from left to right. As can be seen, changing the kernel creates 
quite different behavior for the same system. 


for the same dynamics matrix, as shown in Figure Note also that the dynamical equations 
0 and ( |12[ ) are independent of the choice of domain Q: different domains with different kernels 
may result in the same sequence of matrices K]^. This allows our results to hold for any domain 
over which a kernel can be defined, including examples like graphs, hidden Markov models, and 
strings, which are not typically studied in the controls literature, at virtually no extra complexity 
in implementation beyond the design of the actual sensors and actuators. This remarkable fact is 
why we denote our method to be domain agnostic. 

Since Kk+i is the kernel matrix between the data points and basis vectors, its rows are of the 
form = \k{xi,ci) k{xi,C 2 ) ■■■ k{xi,CM)]- In systems-theoretic language, each row of the 
kernel matrix corresponds to a measurement at a particular location, and the matrix itself acts as 
a measurement operator. We define the generalized observability matrix |18j as 


Ox 




(13) 


where T = are the set of instances U when we apply the measurement operators 

At-. Note that Ox £ Similarly, we can define the generalized controllability matrix as 


'I'x 




(14) 
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'I'x £ A linear system is said to be observable if Oy has full column rank (i.e. Rank Oy = 

M) and is controllable if 'I'x has full row rank, for T = {0, 1 ,..., M — 1} [T5] . 

Observability guarantees that a feedback-based observer can be designed such that the estimate 
of w denoted by vS^ converges exponentially fast to the true state Wk- In particular, observability 
is the necessary condition for the existence of a unique solution to the Riccatti equation required 
in designing a Kalman filter. Therefore, when r], ( have a zero mean Gaussian distribution, a 
Bayes optimal filter can be designed for estimating w if and only if Rank Ox = M. Similarly, 
controllability guarantees that a feedback-based controller can drive the current functional state of 
the system fk to a reference function /ref, as long as /ref G Tic- 

We are now in a position to formally state the spatiotemporal monitoring and control problem 
considered: Given a spatiotemporally evolving system modeled using choose a set of N sensing 
locations X = {xi,..., xn} and i actuating locations V = {di, ..., di} such that even with N M 
and i <C M, the functional evolution of the spatiotemporal model can be estimated robustly, and 
driven (controlled) to a reference function /ref. Our approach to solve this problem relies on the 
design of the measurement operator K such that the pair [A, K) is observable, and the control 
operator Kd such that the pair {A^Kd) is controllable. 


2.2 Theoretical Results 


In this section, we prove results concerning the observability of spatiotemporally varying functions 
modeled by the functional evolution and measurement equations ([^ and Q formulated in Section 
In particular, observability of the system states implies that we can recover the current state 


2.1 


of the spatiotemporally varying function using a small number of sampling locations N, which 
allows us to 1) track the function, and 2) predict its evolution forward in time. It should be noted 


that the results are also applicable to controllability of the system in (12) since the structure of 


the control matrix Kcd is also that of a Kernel matrix. We first show in Proposition 2.1 that if 
A has a full-rank Jordan decomposition, the kernel matrix meeting a condition called shadedness 
(to be defined below) is sufficient for the system to be observable. In Proposition 2.2 we prove a 
lower bound on the number of sampling locations required for observability which holds for more 
general A. Finally, in Proposition 2.3 we outline a method that achieves this lower bound for 


certain kernels. Since both K and Kcd are kernel matrices generated from a shared kernel, these 
observability results translate directly into controllability results. 

To prove our results, we will leverage the spectral decomposition of A. Specifically, recall that 
any matrix A G ig similar to a unique block diagonal matrix A (i.e. G invertible 

such that A = PAP~^) whose diagonal blocks are matrices of the form 


Afc(A„A:) := 


M h 


0 0 


0 

h 

M 


(15) 


where (Aj, A*) is a complex conjugate eigenvalue of A, and M = 


and I 2 = 


. Real 


1 0 
0 1 

ete real Jordan form of 


Ail A2 
-A*2 

eigenvalues A* correspond to the case M = A* and I 2 = 1. Thus the comp 
A will be the appropriate diagonal array of these blocks. If all the eigenvalues A* are nonzero and 
real, we say the matrix has a full-rank Jordan decomposition. 

Definition 2.1. (Shaded Kernel Matrix) Let k : Q x Q ^ M. be a positive-definite kernel on a 
eompact domain Q. Let C = [ci, C 2 , • • • , cm}, Cj £ Q be the points generating a finite-dimensional 
covering of the reproducing kernel Hilbert space H assoeiated to k{x,y), and let X = {xi,... ,xn}, 
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Xi £ Q. Let K G -^NxM kernel matrix, where Kij := k{xi,Cj). For each row := 

[k{xi,ci),k{xi,C 2 )-, ■ ■ ■ ■,k{xi,CM)], define the set := 4* \ ■ • •) indices in the 

kernel matrix row i which are nonzero. Then if 

IJ XW = {1,2,...,M}, ( 16 ) 

l<i<Af 

we denote K as a shaded kernel matrix (see figure^. 

This condition implies that the null space of the adjoint of X as a linear operator between 
Euclidean spaces, i.e. —)• is trivial. Note that, in principle, for the Gaussian kernel, 

a single row generates a shaded kernel matrix, although this matrix can have many entries that 
are extremely close to zero. With this definition in place, we can prove the following proposition, 
which shows that if A has a full-rank Jordan decomposition, a shaded kernel matrix is sufficient to 
prove observability. 

Proposition 2.1. Let k : kl x LI ^ M. be a positive definite kernel on a domain 12. Let C = 
[ci,C 2 ,--- ,cm}, Cj £ Ll be the points generating a finite-dimensional covering of the reproducing 
kernel Hilbert space H associated to k{x,y), and consider the discrete linear system on H given 
by the evolution and measurement equations and Let A G 5 g Q full-rank Jordan 

decomposition of the form A = PKP~^, where A = diag([Ai A 2 ••• Ao]), and there are no 

repeated eigenvalues. Given a set of time instances T = {ti,t 2 ,... Hl}, and a set of sampling 
locations X = {xi,..., xat}, the system 0 is observable if the kernel matrix Kij := k{xi,Cj) is 
shaded, K^, the row vector generated by summing the rows of K, has all nonzero entries, T has 
distinct values, and |T| > M. 

Proof. To begin, consider a system where A = A, with Jordan blocks {Ai, A 2 ,..., Ao} along the 
diagonal. Then = diag([A^* A 2 ••• A^]). We have that 




■A*i 

O' 




0 

... Ag 

Or = 


= [K ■■■ K] 



^ V 

A}^ 

0 


s. 

_ 0 

... 


AglRMixM 


Recall that a matrix’s rank is preserved under a product with an invertible matrix. Design a matrix 
U G s.t. K := UK is a matrix with one row vector of nonzeros, and all of the remaining 

rows as zeros. Then rank(KA) = rank([/KA). Therefore, we have that 




■feiiA*/ 


• kiM><o 

0 

= 

0 

0 

0 





0 

0 


0 

0 

0 
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KA^^ = 
















Therefore, following some more elementary row operations encoded by y G '^mlxML^ that 






'hiX'f • 
~kii\{^ • 

• kiM^o 

■ kiM^o 


$ 

K ■ 

■ K 



1 

0 

0 

klM^O 

0 


0 


If the individual entries ku are nonzero, and the Jordan block diagonals have nonzero eigenvalues, 
the columns of $ become linearly independent. Therefore, if L > M, the column rank of Ot is M, 
which results in an observable system. 

To extend this proof to matrices A = PKP~^, note that 


O'Y = 




'APA^ip-i' 

_KA^i'_ 


_KPA^i'P-K_ 


= [K • • • K] PA^P- 


where P G A* G a,nd G are the block diagonal matrices 

associated with the system. Since P is an invertible matrix, the conclusions about the column rank 
drawn before still hold, and the system is observable. □ 


When the eigenvalues of the system matrix are repeated, it is not enough for K to be shaded. 
The next proposition proves a lower bound on the number of observations required. 


Proposition 2.2. Suppose that the conditions in Proposition 2.1 hold, with the relaxation that the 
Jordan blocks [Ai A 2 • • • Ao] may have repeated eigenvalues. Let r he the number of unique 
eigenvalues of A, and let 7 (Aj) denote the geometric multiplicity of eigenvalue A*. Then there exist 
kernels k{x, y) such that the lower bound I on the number of sampling locations N is given by the 
cyclic index of A, which can he computed as 


I = max 7(Ai). 

l< 2 <r 


(17) 


Proof. We first prove the lower bound. WLOG, let K have I — 1 fully shaded, linearly independent 
rows, and write it as 


K = 


kn 


k 


12 


k 


(1-1)1 


k, 


kiM 


k(l-l)M, 


''{1-1)2 ■ 

Since the cyclic index is I, this implies that at least one eigenvalue, say A, has I Jordan blocks. 
Define indices ji,j2,---,ji £ {1; 2,..., M} as the columns corresponding to the leading entries of 
the I Jordan blocks corresponding to A. WLOG, let ji = 1. Using ideas similar to the last proof, 
we can write the observability matrix as 

kiiX^^ ■ ■ ■ kijjX^^ 


(Dy '■= 


fciiA*^ A:ij,A* 


fc(z-i)iA*^ • • • k(i_i)j,X''^ 


d-m 
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Define A := [A*^ A*^ • • • A^^ 

rp 

] . Then the above matrix becomes 


kiiX 

kij2^ 

kiji 

Ox •= 





1 

■ ■ 

k{i-i)ji 


A 


We need to show that one of the columns above can be written in terms of the others. This is 
equivalent to solving the linear system 


1- 

1_ 


kij2 

kiji 


Cl 

k2h 

= 

k2j2 

k2ji 


C2 

1- 

? . 

1_ 


1 

? . 

to 



TO-1). 


Suppose the kernel matrix on the RHS is generated from the Gaussian kernel. From [TO], it’s known 
that every principal minor of a Gaussian kernel matrix is invertible, which implies that Ox cannot 
be observable. □ 


We now prove a sufficient condition for the observability of a system with repeated eigenvalues, 
but with the condition that the Jordan blocks are trivial. 


Proposition 2.3. Suppose that the conditions in Proposition 2.1 hold, with the relaxation that 
the Jordan blocks [Ai A 2 • • • Ao] naay have repeated eigenvalues, and where A, are single¬ 
dimensional. Let I be the cyclic index of A. We define 


K = 


■rW 

rJ) 


(18) 


as the /-shaded matrix which consists of I shaded matrices with the property that any subset of I 
columns in the matrix are linearly independent from each other. Then system & is observable if 
T has distinct values, and |T| > M. 


Proof. A cyclic index of I for this system implies that there exists an eigenvalue A that’s repeated I 
times. WLOG, let K have I fully shaded, linearly independent rows, and, assume that the column 
indices corresponding to this eigenvalue are {1,2,...,/}. Define A* := A*^ • • • A^^] . Then 


Ox ■— 


/ciiAi /C12A2 

kii^i ki2\2 


klM^M 

klM^M- 


Let Ai = A 2 = • • • A; := A. Focusing on these first / columns of this matrix, this implies that we 
need to find constants ci, C 2 ,..., c;_i s.t. 


kii 


ki2 


kii 


= Cl 


H-h Q_1 


_kii_ 


_ki2_ 


kii._ 


However, these columns are linearly independent by assumption, and thus no such constants exist, 
implying that Ox is observable. □ 
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Figure 4: Pictorial representations of shaded kernel matrices. 


Algorithm 1 Kernel Observer (Transition Learning) 

Input: Kernel k, basis points C, final time step Tf. 
while k < Tf do 

1) Sample data {yk}i^i from f{x, k). 

2) Estimate Wk via standard kernel inference procedure. 

3) Store weights Wk in matrix W € . 

end while 

Infer A using method of choice (e.g. matrix least squares). Compute the covariance matrix B of the observed 
weights W. 

Output: estimated transition matrix A, predictive covariance matrix B. 


Algorithm 2 Kernel Observer (Estimation and Prediction) 


Input: Kernel k, basis points C, estimated system matrix A, estimated covariance ma trix B. 

Compute Observation Matrix: Compute the cyclic index I of A, and compute (181, by possibly iterating over 

T = {*1,... jiiv}. 

Initialize Observer: Use A, B, and K to initialize a state-observer (e.g. Kalman filter (KF)) on Be- 
while measurements available do 

1) Sample data {yD^i from f{x, k). 

2) Propagate KF estimate Wk+i forward to time tf, correct using measurement feedback with 

3) Output predicted function f{x,k + 1) and predictive covariance of KF. 

end while 


An example of a kernel such that any subset of I columns in K are linearly independent of each 
other is the Gaussian kernel evaluated on sampling locations {xi,..., xat}, where Xj G n C and 
Xj ^ Xj. 

We can reuse Propositions 2.1, 2.2t and |2.3| to prove kernel controllability results, because the 
structure of the control matrix Kqd in (0 is also that of a kernel matrix. 


3 Experimental Results 

We report experimental results on controlling synthetic and modeling real-world data. All experi¬ 
ments were performed using MATLAB on a laptop running Ubuntu 14.04 with 8 GB of RAM, and 
an Intel core i7 processor. 
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Algorithm 3 Kernel Controller 

Input: Kernel k, basis points C, estimated system matrix A, estimated covariance matrix B, and function /ref to 
drive initial function to. 

Initialize Observer: (see Algorithm]^. 

Initialize Controller: Use Jordan decomposition of A to obtain control locations T>, compute kernel matrix 
Kcd € between T) and C, and initialize controller (e.g. LQR) utilizing {A,B). 

while measurements available do 

1) Sample data {j/fejiLi from f{x, k). 

2) Utilize observer to estimate Wk+\- 

3) Use Wk+i and /ref as input to controller to get feedback. 

end while 


3.1 Prediction of global ocean surface temperature 


We first analyzed the feasibility of this modeling approach on a large dataset: the 4 km AVHRR 
Pathfinder project, which is a satellite monitoring global ocean surface temperature. This data 
was obtained from the National Oceanographic Data Center. The data consists of longitude- 
latitude measurements on a 2D domain D C [—180,180] x [—90,90]; this dataset is challenging, 
with measurements at over 37 million coordinates, and several missing pieces of data. The goal 
was to learn the day and night temperature models fk{x,y) G Tic, where He was generated using 
the Gaussian kernel k{x,y) = \ We first did a search for the ideal bandwidth a for 

a 304-dimensional sparse Gaussian process model with a Gaussian kernel. The set of atoms J-c 
was determined through a linear independence test based sparsification algorithm [5]. Once the 
parameters were chosen, a budgeted GP was learned for each date, resulting in w eig ht vectors 
i G {1, 2,..., 365}. We used Algorithm to infer A, and applied Algorithm [^with N G 
{280, 500,1000, 2000} chosen randomly in the D to track the system state given a random initial 
condition wq. Figures 6(a) and 6(c) show a comparison of the deviation in percentage of the 
estimated values from the real data, averaged over all the days. As can be seen, the observer 
enables the prediction of functional evolution without needing all the measurements (37 million), 
and performance comparable to sampling over all locations is obtained with sampling only over 
2,000 locations. Note that here, even though the system model is observable at N = 280, since 
the dynamics are not truly linear in He, we get better performance with more sampling locations. 
Finally, |6(b)| and |6(d)| show that the time required to estimate the state during function tracking 
with kernel observer are an order of magnitude better than retraining the model every time step 
(“original” in the figure), with comparable performance. 


3.2 Control of a linear PDE 

We then employed kernel controllers for controlling an approximation to the scalar diffusion equa¬ 
tion ut = buxx on the domain D = [0,1], with b = 0.25. The solution to this equation is infinite¬ 
dimensional, so we chose a kernel k{x,y) = \ and a set of atoms He = (ci,... ,cm}, 

Ci G D, with M = 25 generating He, the space approximating H, and another set of atoms 
= {^/’(di)j ■ • • ,'fpidi)}, dj £ Q, i = 13, generating the control space Hd- The number of, and 
the location of the observations was chosen to be the same as that of the actuation locations dj. 
First, tests (not reported here) were conducted to ensure that the solution t o the diffusion equation 
is well approximated in He- Algorithmwas then used to infer A. Figure 7(a) shows an example 


of an initial function /init evolving according to the PDE. A reference function /j-gf G He was chosen 
to drive /init to /ref under the action of the PDE. Finally, Algorithm]^ was used to control the PDE. 
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(a) Pathfinder raw data on a fixed daty 


(b) Pathfinder kernel observer estimate 


Figure 5: Pathfinder raw data and kernel observer estimate, computed on data from 05/01/2012. 

Figure [7(b) | shows /init being driven to /ref, while Figure 7(c) shows the absolute value of the error 
between and /ref as a function of time. 


4 Conclusions 

In this paper we presented a systems theoretic approach to the problem of modeling, estimating, and 
controlling complex spatiotemporally evolving phenomena. Our approach focused on developing a 
predictive model of spatiotemporal evolution by layering a dynamical systems prior over temporal 
evolution of weights of a kernel model. The resulting model can approximate PDF evolution, while 
it has the form of a finite state linear dynamical system. The lower bounds on the number of 
sampling and actuation locations provided in this paper are non-conservative, as such they provide 
direct guidance in ensuring robust real-world sensor network and actuation matrix design that must 
also account for fault-tolerance and reliability considerations. 
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